use endpoints discovery to find all available control plane nodes #273
Description:
Currently, ContainerInsights uses a static target discovery mechanism when initializing the Prometheus scraping configuration for the control plane. This approach targets the cluster IP of the Kubernetes service that fronts the control plane nodes, so each scrape is load-balanced to just one control plane node. As a result, the agent fetches data from only a single node per scrape, and the collected metrics are incomplete.
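The effect described above can be illustrated with a small simulation. This is a minimal sketch, not agent code: `controlPlaneNode`, `scrapeViaClusterIP`, and `scrapeViaEndpoints` are hypothetical names, the counter values are made up, and the load balancer's backend choice is reduced to a plain index (real selection by kube-proxy is not modeled).

```go
package main

import "fmt"

// controlPlaneNode is a hypothetical stand-in for one API server instance.
type controlPlaneNode struct {
	IP            string
	RequestsTotal float64 // this node's local apiserver_request_total value
}

// scrapeViaClusterIP models a scrape through the service's cluster IP:
// the connection is forwarded to a single backend, so the scrape returns
// only that backend's counter. Backend choice is simplified to an index.
func scrapeViaClusterIP(nodes []controlPlaneNode, pick int) float64 {
	return nodes[pick%len(nodes)].RequestsTotal
}

// scrapeViaEndpoints models endpoint discovery: every backend IP is its own
// target, so the per-node values can be summed across all nodes.
func scrapeViaEndpoints(nodes []controlPlaneNode) float64 {
	total := 0.0
	for _, n := range nodes {
		total += n.RequestsTotal
	}
	return total
}

func main() {
	// Illustrative values only, not measurements.
	nodes := []controlPlaneNode{
		{IP: "10.0.1.10", RequestsTotal: 1200},
		{IP: "10.0.2.10", RequestsTotal: 950},
	}
	fmt.Println("via cluster IP:", scrapeViaClusterIP(nodes, 0)) // 1200
	fmt.Println("via endpoints:", scrapeViaEndpoints(nodes))     // 2150
}
```

With two nodes, a cluster-IP scrape reports one node's counter (1200), while scraping each endpoint lets the totals be combined (2150).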
This PR addresses the issue by switching the control plane Prometheus configuration to endpoint discovery. This native Prometheus mechanism queries endpoints (e.g., the `kubernetes` endpoint in the `default` namespace for the control plane) and creates a target for each IP associated with the endpoint (per port). Consequently, the agent scrapes control plane metrics from all nodes. Additionally, if the control plane scales up, endpoint discovery automatically detects the newly added nodes and begins scraping them as new targets.
Changes:
- Switch from `staticConfig` to k8s endpoints discovery in the scraping configuration for the control plane (CP)
- `ClusterName`, `Sources`, `NodeName`, `Type` and `Version`
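The "one target per IP, per port" behavior of endpoint discovery can be sketched as follows. This is an illustration under assumptions, not the agent's or Prometheus's implementation: the structs are simplified, hypothetical mirrors of the Kubernetes core/v1 `Endpoints` shape, and the `https`/443 port and the node IPs are illustrative. In stock Prometheus the equivalent is `kubernetes_sd_configs` with `role: endpoints`, usually paired with a `keep` relabel rule matching `default;kubernetes;https`. Whether the agent's configuration filters in exactly that way is not shown in this PR description.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Simplified, hypothetical mirrors of the Kubernetes core/v1 Endpoints shape.
type EndpointAddress struct{ IP string }

type EndpointPort struct {
	Name string
	Port int32
}

type EndpointSubset struct {
	Addresses []EndpointAddress
	Ports     []EndpointPort
}

type Endpoints struct {
	Namespace, Name string
	Subsets         []EndpointSubset
}

// targetsFromEndpoints emits one "ip:port" scrape target per (address, port)
// pair, so every control plane node behind the endpoint becomes its own target.
func targetsFromEndpoints(ep Endpoints) []string {
	var targets []string
	for _, subset := range ep.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				targets = append(targets, net.JoinHostPort(addr.IP, strconv.Itoa(int(port.Port))))
			}
		}
	}
	return targets
}

func main() {
	ep := Endpoints{
		Namespace: "default",
		Name:      "kubernetes",
		Subsets: []EndpointSubset{{
			Addresses: []EndpointAddress{{IP: "10.0.1.10"}, {IP: "10.0.2.10"}},
			Ports:     []EndpointPort{{Name: "https", Port: 443}},
		}},
	}
	for _, t := range targetsFromEndpoints(ep) {
		fmt.Println(t) // 10.0.1.10:443, then 10.0.2.10:443
	}
}
```

Because targets are derived from the endpoint's current address list, a control plane scale-up that adds an address to the endpoint yields an extra target on the next discovery refresh, with no configuration change.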
Testing:
Attached is a screenshot demonstrating that the sum of the `apiserver_request_total` metrics scraped by the agent now matches the aggregated values reported by EKS vended metrics. This confirms that the new endpoint discovery mechanism captures data from all control plane nodes.

Documentation: